Color-based two-hand 3D pose estimation in the global coordinate system is essential in many applications. However, there are very few datasets dedicated to this task and no existing dataset supports estimation in non-laboratory environments. This is largely attributed to the sophisticated data collection process required for 3D hand pose annotations, which also leads to the difficulty of obtaining instances with the level of visual diversity needed for in-the-wild estimation. Towards this goal, a large-scale dataset, Ego2Hands, was recently proposed to address the task of two-hand segmentation and detection in the wild. The proposed composition-based data generation technique can create two-hand instances with quality, quantity and diversity that generalize well to unseen domains. In this work, we present Ego2HandsPose, an extension of Ego2Hands that contains 3D hand pose annotations and is the first dataset that enables color-based two-hand 3D tracking in unseen domains. To this end, we develop a set of parametric fitting algorithms to enable 1) 3D hand pose annotation using a single image, 2) automatic conversion from 2D to 3D hand poses and 3) accurate two-hand tracking with temporal consistency. We provide incremental quantitative analysis on the multi-stage pipeline and show that training on our dataset achieves state-of-the-art results that significantly outperform other datasets for the task of egocentric two-hand global 3D pose estimation.
translated by 谷歌翻译
Since deep neural networks became the state-of-the-art approach in computer vision for dense prediction tasks, many methods have been developed for automatic estimation of the target output from the visual input. Although the estimation accuracy of the proposed automatic methods continues to improve, interactive refinement is often required for further correction. Recently, a feature backpropagating refinement scheme (f-BRS) was proposed for the task of interactive segmentation, which enables efficient optimization of a small set of auxiliary variables inserted into the pretrained network to produce object segmentations that better align with user input. However, the proposed auxiliary variables contain only channel-wise scale and bias, limiting the optimization to global refinement only. In this work, to generalize backpropagating refinement to a wide range of dense prediction tasks, we introduce a set of G-BRS (generalized backpropagating refinement scheme) layers that enable both global and localized refinement for the following tasks: interactive segmentation, semantic segmentation, image matting and monocular depth estimation. Experiments on SBD, Cityscapes, Mapillary Vistas, Composition-1k and NYU-Depth-V2 show that our method can successfully generalize and significantly improve the performance of existing pretrained state-of-the-art models with only a few clicks.
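The channel-wise scale-and-bias refinement that f-BRS performs, and that G-BRS generalizes, can be illustrated with a minimal sketch: per-channel scale and bias variables are inserted before a frozen prediction head and optimized by gradient descent so the output matches user clicks. The feature map, head weights, and logistic loss here are stand-in assumptions, not the paper's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def brs_refine(features, head_w, clicks, steps=2000, lr=0.05):
    """Optimize f-BRS-style auxiliary variables (per-channel scale and
    bias) so a frozen linear head's prediction matches user clicks.

    features: (C, H, W) frozen feature map
    head_w:   (C,) frozen 1x1-conv head weights
    clicks:   list of (row, col, label) with label in {0.0, 1.0}
    """
    C, _, _ = features.shape
    scale, bias = np.ones(C), np.zeros(C)
    for _ in range(steps):
        refined = scale[:, None, None] * features + bias[:, None, None]
        prob = sigmoid(np.tensordot(head_w, refined, axes=1))  # (H, W)
        # Logistic-loss gradients accumulated at clicked pixels only;
        # the rest of the prediction is left to the frozen network.
        g_scale, g_bias = np.zeros(C), np.zeros(C)
        for r, c, y in clicks:
            d = prob[r, c] - y            # dLoss/dLogit for cross-entropy
            g_scale += d * head_w * features[:, r, c]
            g_bias += d * head_w
        scale -= lr * g_scale
        bias -= lr * g_bias
    refined = scale[:, None, None] * features + bias[:, None, None]
    return scale, bias, sigmoid(np.tensordot(head_w, refined, axes=1))
```

Because the logits are linear in the scale/bias variables, this inner optimization is convex, which is what makes the refinement cheap compared to fine-tuning the whole network; the limitation noted above is that one scale/bias pair per channel can only shift the prediction globally.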
Hand segmentation and detection in truly unconstrained RGB-based settings is important for many applications. However, existing datasets are far from sufficient in both size and variation due to the infeasibility of manually annotating large amounts of segmentation and detection data. As a result, current methods are limited by many underlying assumptions such as constrained environments, consistent skin color and lighting. In this work, we present Ego2Hands, a large-scale RGB-based egocentric hand segmentation/detection dataset that is semi-automatically annotated, along with a color-invariant compositing-based data generation technique capable of creating training data with large quantity and variation. For quantitative analysis, we manually annotated an evaluation set that significantly exceeds existing benchmarks in quantity, diversity and annotation accuracy. We provide cross-dataset evaluation as well as thorough analysis of the performance of state-of-the-art models on Ego2Hands to show that our dataset and data generation technique can produce models that generalize to unseen environments without domain adaptation.
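The core of compositing-based data generation as described above is alpha-blending segmented hand crops onto arbitrary backgrounds, which yields a composited image and a free segmentation mask. A minimal sketch, with hypothetical array shapes and no photometric augmentation:

```python
import numpy as np

def composite_hand(background, hand_rgb, hand_mask, top, left):
    """Paste a segmented hand crop onto a background image and return
    the composited image together with its segmentation mask.

    background: (H, W, 3) uint8 background image
    hand_rgb:   (h, w, 3) uint8 hand crop
    hand_mask:  (h, w) float alpha in [0, 1], 1 = hand pixel
    """
    out = background.astype(np.float32)
    h, w = hand_mask.shape
    region = out[top:top + h, left:left + w]
    alpha = hand_mask[..., None]
    # Standard alpha compositing: hand over background.
    region[:] = alpha * hand_rgb + (1.0 - alpha) * region
    # The pasted alpha doubles as ground-truth segmentation.
    full_mask = np.zeros(background.shape[:2], dtype=np.float32)
    full_mask[top:top + h, left:left + w] = hand_mask
    return out.astype(np.uint8), full_mask
```

Sampling random backgrounds, placements, and per-crop color jitter on top of this loop is what would give the quantity and variation the abstract refers to.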
We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance in three value alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and ease for interactive error correction. Extensive human evaluations further confirm its effectiveness.
The ability to distinguish between different movie scenes is critical for understanding the storyline of a movie. However, accurately detecting movie scenes is often challenging as it requires the ability to reason over very long movie segments. This is in contrast to most existing video recognition models, which are typically designed for short-range video analysis. This work proposes a State-Space Transformer model that can efficiently capture dependencies in long movie videos for accurate movie scene detection. Our model, dubbed TranS4mer, is built using a novel S4A building block, which combines the strengths of structured state-space sequence (S4) and self-attention (A) layers. Given a sequence of frames divided into movie shots (uninterrupted periods where the camera position does not change), the S4A block first applies self-attention to capture short-range intra-shot dependencies. Afterward, the state-space operation in the S4A block is used to aggregate long-range inter-shot cues. The final TranS4mer model, which can be trained end-to-end, is obtained by stacking the S4A blocks one after the other multiple times. Our proposed TranS4mer outperforms all prior methods in three movie scene detection datasets, including MovieNet, BBC, and OVSD, while also being $2\times$ faster and requiring $3\times$ less GPU memory than standard Transformer models. We will release our code and models.
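The two-stage structure of the S4A block, self-attention within a shot followed by a state-space scan across shots, can be sketched in a toy form. The identity projections, mean pooling, and scalar-decay recurrence below are simplifying assumptions standing in for the actual S4 parameterization:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head self-attention with identity Q/K/V: (n, d) -> (n, d)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

def state_space_scan(x, a=0.9):
    """Diagonal linear recurrence h_t = a*h_{t-1} + x_t, y_t = h_t:
    a toy stand-in for the S4 layer's long-range aggregation."""
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t, xt in enumerate(x):
        h = a * h + xt
        out[t] = h
    return out

def s4a_block(shots):
    """shots: list of (frames_per_shot, d) feature arrays.
    1) self-attention inside each shot (short-range, intra-shot cues),
    2) mean-pool each shot, 3) state-space scan over the shot sequence
    (long-range, inter-shot cues)."""
    attended = [self_attention(s) for s in shots]
    pooled = np.stack([s.mean(axis=0) for s in attended])  # (num_shots, d)
    return state_space_scan(pooled)
```

The attention cost is quadratic only in the (short) shot length, while the scan is linear in the number of shots, which is the source of the speed and memory advantage over a full Transformer that the abstract reports.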
Minimising the longest travel distance for a group of mobile robots with interchangeable goals requires knowledge of the shortest length paths between all robots and goal destinations. Determining the exact length of the shortest paths in an environment with obstacles is challenging and cannot be guaranteed in a finite time. We propose an algorithm in which the accuracy of the path planning is iteratively increased. The approach provides a certificate when the uncertainties on estimates of the shortest paths become small enough to guarantee the optimality of the goal assignment. To this end, we apply results from assignment sensitivity assuming upper and lower bounds on the length of the shortest paths. We then provide polynomial-time methods to find such bounds by applying sampling-based path planning. The upper bounds are given by feasible paths, the lower bounds are obtained by expanding the sample set and leveraging knowledge of the sample dispersion. We demonstrate the application of the proposed method with a multi-robot path-planning case study.
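The certificate idea described above can be illustrated with a brute-force sketch: given interval bounds on every robot-to-goal shortest-path length, the assignment minimizing the worst-case (upper-bound) longest travel is provably optimal once its bottleneck upper bound is no larger than the bottleneck lower bound of every competing assignment. The enumeration over permutations is an illustrative simplification; the paper's polynomial-time sensitivity analysis is not reproduced here.

```python
from itertools import permutations

def certified_bottleneck_assignment(lower, upper):
    """lower[i][j] <= true shortest-path length (robot i -> goal j) <= upper[i][j].

    Returns (assignment, certified): the assignment minimizing the
    worst-case longest travel distance, and whether the current bound
    accuracy already certifies it as optimal."""
    n = len(upper)
    best = min(permutations(range(n)),
               key=lambda p: max(upper[i][p[i]] for i in range(n)))
    best_ub = max(upper[i][best[i]] for i in range(n))
    # Certified iff no other assignment could possibly beat it even when
    # every one of its paths attains its lower bound.
    certified = all(
        max(lower[i][p[i]] for i in range(n)) >= best_ub
        for p in permutations(range(n)) if p != best
    )
    return best, certified
```

When `certified` comes back `False`, the iterative scheme in the abstract would refine the sampling-based path planner to tighten the bounds and try again.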
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
The correct functioning of photovoltaic (PV) cells is critical to ensuring the optimal performance of a solar plant. Anomaly detection techniques for PV cells can result in significant cost savings in operation and maintenance (O&M). Recent research has focused on deep learning techniques for automatically detecting anomalies in Electroluminescence (EL) images. Automated anomaly annotations can improve current O&M methodologies and help develop decision-making systems to extend the life-cycle of the PV cells and predict failures. This paper addresses the lack of anomaly segmentation annotations in the literature by proposing a combination of state-of-the-art data-driven techniques to create a Golden Standard benchmark. The proposed method stands out for (1) its adaptability to new PV cell types, (2) cost-efficient fine-tuning, and (3) leveraging public datasets to generate advanced annotations. The methodology has been validated in the annotation of a widely used dataset, obtaining a reduction of the annotation cost by 60%.
While skin cancer classification has been a popular and valuable deep learning application for years, there has been little consideration of the context in which testing images are taken. Traditional melanoma classifiers rely on the assumption that their testing environments are analogous to the structured images on which they are trained. This paper combats this notion, arguing that mole size, a vital attribute in professional dermatology, is a red herring in automated melanoma detection. Although malignant melanomas are consistently larger than benign melanomas, this distinction proves unreliable and harmful when images cannot be contextually scaled. This implementation builds a custom model that eliminates size as a training feature to prevent overfitting to incorrect parameters. Additionally, random rotation and contrast augmentations are performed to simulate the real-world use of melanoma detection applications. Several custom models with varying forms of data augmentation are implemented to demonstrate the most significant features of the generalization abilities of mole classifiers. These implementations show that user unpredictability is crucial when utilizing such applications. The caution required when manually modifying data is acknowledged, as data loss and biased conclusions are necessary considerations in this process. Additionally, mole size inconsistency and its significance are discussed in both the dermatology and deep learning communities.
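The random rotation and contrast augmentations mentioned above can be sketched minimally. The 90-degree rotations, the contrast-jitter range, and the mean-centered formulation are assumptions for illustration; note there is deliberately no resizing step, so absolute mole size in pixels carries no consistent meaning across samples:

```python
import numpy as np

def augment(image, rng):
    """Apply a random rotation and a random contrast jitter.

    image: (H, W, C) float array with values in [0, 1] (H == W assumed
    so that 90-degree rotations preserve the shape)
    rng:   numpy random Generator
    """
    k = int(rng.integers(0, 4))            # rotate by k * 90 degrees
    out = np.rot90(image, k, axes=(0, 1)).copy()
    factor = rng.uniform(0.7, 1.3)         # contrast factor
    mean = out.mean(axis=(0, 1), keepdims=True)
    # Stretch or compress values around the per-channel mean, then clip.
    out = np.clip((out - mean) * factor + mean, 0.0, 1.0)
    return out
```

Applying such transforms at training time simulates the unpredictable framing and exposure of user-captured photos that the abstract argues these classifiers must tolerate.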